BLEU score
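For context on the metric the heading names: BLEU is the geometric mean of clipped n-gram precisions between a candidate and a reference, scaled by a brevity penalty. A minimal self-contained sketch of sentence-level BLEU (uniform weights, no smoothing; real toolkits such as sacreBLEU add smoothing and corpus-level aggregation):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: clipped n-gram precisions with
    uniform weights and a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    log_p = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_p += math.log(clipped / total) / max_n

    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_p)

cand = "the quick brown fox jumped over the lazy dog".split()
ref = "the quick brown fox jumps over the lazy dog".split()
print(round(bleu(cand, ref), 3))  # → 0.597
```

One token change ("jumped" vs "jumps") breaks several higher-order n-grams, which is why the score drops well below 1 even though eight of nine unigrams match.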
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > Middle East > Oman (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Germany > Berlin (0.04)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)
- Europe > United Kingdom > Wales (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > China > Beijing > Beijing (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands (0.04)
- Asia > Uzbekistan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
When does label smoothing help?
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model's predictions.
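The label smoothing the abstract refers to mixes each one-hot target with the uniform distribution over the K classes, y_ls = (1 − α) · y + α / K. A minimal sketch (the α = 0.1 value is an illustrative choice, not taken from the paper):

```python
import numpy as np

def smooth_labels(one_hot, alpha=0.1):
    """Label smoothing: interpolate the one-hot target with the
    uniform distribution over the K classes.
    y_ls = (1 - alpha) * y + alpha / K
    """
    k = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / k

y = np.array([0.0, 0.0, 1.0, 0.0])  # hard target, K = 4 classes
print(smooth_labels(y, alpha=0.1))  # → [0.025 0.025 0.925 0.025]
```

The smoothed target still sums to 1, so it remains a valid distribution for the cross-entropy loss; the nonzero mass on wrong classes is what pulls same-class representations into the tight clusters the abstract describes.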
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu
Neural Machine Translation (NMT) has achieved remarkable progress with the quick evolvement of model structures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level.
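In the standard encoder-decoder setup, every decoder layer attends only to the encoder's top layer. The layer-wise idea in the abstract can be sketched as decoder layer i attending to encoder layer i instead, so low-level decoder states see low-level encoder states. This is a hypothetical simplification with toy single-head attention, not the paper's exact model:

```python
import numpy as np

def attend(q, k, v):
    # Scaled dot-product attention (single head, no masking).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # softmax over source positions
    return w @ v

rng = np.random.default_rng(0)
d, n_layers = 8, 3
# Toy encoder states per layer (source length 5) and one decoder query.
enc_layers = [rng.normal(size=(5, d)) for _ in range(n_layers)]
dec = rng.normal(size=(1, d))

# Standard NMT: every decoder layer attends to the TOP encoder layer.
standard = dec
for _ in range(n_layers):
    standard = attend(standard, enc_layers[-1], enc_layers[-1])

# Layer-wise coordination (sketch): decoder layer i attends to encoder layer i.
coordinated = dec
for i in range(n_layers):
    coordinated = attend(coordinated, enc_layers[i], enc_layers[i])
```

The two loops differ only in which encoder layer supplies the keys and values, which is the coordination the abstract describes: representations are aligned level by level rather than funneled through the top layer alone.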
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > Texas > Travis County > Austin (0.05)
- Europe > Germany > Berlin (0.05)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > China > Hong Kong (0.04)
- North America > Dominican Republic (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- North America > Canada (0.04)